Smart Paper Notes

SW-VAE: Weakly Supervised Learn Disentangled Representation Via Latent Factor Swapping

Jiageng Zhu , Hanchen Xie , Wael Abd-Almageed

Categories: Machine Learning | Artificial Intelligence

2022-09-21

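Representation disentanglement is an important goal of representation learning that benefits various downstream tasks. To achieve it, many unsupervised methods for learning disentangled representations have been developed. However, training without any supervision signal has been shown to be inadequate for learning disentangled representations. We therefore propose a novel weakly supervised training approach, named SW-VAE, which uses the generative factors of the dataset to incorporate pairs of input observations as supervision signals. We also introduce strategies that gradually increase the learning difficulty during training to smooth the training process. Across several datasets, our model shows significant improvement over state-of-the-art (SOTA) methods on representation disentanglement tasks.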
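The abstract does not spell out how latent factors are swapped, so the following is only a minimal sketch of the general idea: exchanging selected latent dimensions between the codes of two paired observations. The function name `swap_latent_factors`, the plain-list latent codes, and the choice of dimensions are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def swap_latent_factors(
    z_a: Sequence[float],
    z_b: Sequence[float],
    swap_dims: Sequence[int],
) -> Tuple[List[float], List[float]]:
    """Exchange the chosen latent dimensions between two latent codes."""
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same length")
    out_a, out_b = list(z_a), list(z_b)
    for d in swap_dims:
        out_a[d], out_b[d] = out_b[d], out_a[d]
    return out_a, out_b


# Hypothetical latent codes for a pair of observations that are assumed
# to share the generative factors encoded in dimensions 0 and 2.
z_a = [0.5, -1.0, 2.0, 0.1]
z_b = [0.5, 3.0, 2.0, -0.7]

# Swapping the shared dimensions leaves both codes unchanged, so a decoder
# should reconstruct the same images from the swapped codes.
same_a, same_b = swap_latent_factors(z_a, z_b, swap_dims=[0, 2])

# Swapping a non-shared dimension changes both codes.
diff_a, diff_b = swap_latent_factors(z_a, z_b, swap_dims=[1])
```

In a weakly supervised setup of this kind, the reconstruction from the swapped codes could be compared with the original inputs to encourage each shared factor to live in a fixed set of dimensions; whether SW-VAE uses exactly this loss is not stated in the abstract.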

Weakly Supervised Invariant Representation Learning Via Disentangling Known and Unknown Nuisance Factors

Jiageng Zhu , Hanchen Xie , Wael Abd-Almageed

Categories: Machine Learning | Artificial Intelligence

2022-09-15

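Disentangled and invariant representations are two critical goals of representation learning, and many approaches have been proposed to achieve one or the other. However, the two goals are in fact complementary, so we propose a framework that accomplishes both simultaneously. We introduce a weakly supervised signal to learn a disentangled representation that consists of three splits containing predictive, known nuisance, and unknown nuisance information, respectively. In addition, we incorporate a contrastive method to enforce representation invariance. Experiments show that the proposed method outperforms state-of-the-art (SOTA) methods on four standard benchmarks and that, without adversarial training, it has better adversarial defense ability than other methods.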

Do-Operation Guided Causal Representation Learning with Reduced Supervision Strength

Jiageng Zhu , Hanchen Xie , Wael AbdAlmageed

Categories: Machine Learning | Artificial Intelligence

2022-06-03

Causal representation learning has been proposed to encode relationships between factors present in high-dimensional data. However, existing methods rely solely on large amounts of labeled data and ignore the fact that samples generated by the same causal mechanism follow the same causal relationships. In this paper, we seek to exploit this information by leveraging the do-operation to reduce supervision strength. We propose a framework that implements the do-operation by swapping latent cause and effect factors encoded from a pair of inputs. Moreover, we identify the inadequacy of existing causal representation metrics both empirically and theoretically, and introduce new metrics for better evaluation. Experiments conducted on both synthetic and real datasets demonstrate the superiority of our method over state-of-the-art methods.

Combining Photogrammetric Computer Vision and Semantic Segmentation for Fine-grained Understanding of Coral Reef Growth under Climate Change

Jiageng Zhong , Ming Li , Hanqi Zhang , Jiangying Qin

Categories: Computer Vision

2022-12-08

Corals are the primary habitat-building life-form on reefs that support a quarter of the species in the ocean. A coral reef ecosystem usually consists of reefs, each of which is like a tall building in any city. These reef-building corals secrete hard calcareous exoskeletons that give them structural rigidity, and are also a prerequisite for our accurate 3D modeling and semantic mapping using advanced photogrammetric computer vision and machine learning. Underwater videography as a modern underwater remote sensing tool is a high-resolution coral habitat survey and mapping technique. In this paper, detailed 3D mesh models, digital surface models and orthophotos of the coral habitat are generated from the collected coral images and underwater control points. Meanwhile, a novel pixel-wise semantic segmentation approach of orthophotos is performed by advanced deep learning. Finally, the semantic map is mapped into 3D space. For the first time, 3D fine-grained semantic modeling and rugosity evaluation of coral reefs have been completed at millimeter (mm) accuracy. This provides a new and powerful method for understanding the processes and characteristics of coral reef change at high spatial and temporal resolution under climate change.

GreenPLM: Cross-lingual pre-trained language models conversion with (almost) no cost

Qingcheng Zeng , Lucas Garay , Peilin Zhou , Dading Chong , Yining Hua , Jiageng Wu , Yikang Pan , Han Zhou , Jie Yang

Categories: Natural Language Processing

2022-11-13

While large pre-trained models have transformed the field of natural language processing (NLP), the high training cost and low cross-lingual availability of such models prevent the new advances from being equally shared by users across all languages, especially the less spoken ones. To promote equal opportunities for all language speakers in NLP research and to reduce energy consumption for sustainability, this study proposes an effective and energy-efficient framework GreenPLM that uses bilingual lexicons to directly translate language models of one language into other languages at (almost) no additional cost. We validate this approach in 18 languages and show that this framework is comparable to, if not better than, other heuristics trained with high cost. In addition, when given a low computational cost (2.5%), the framework outperforms the original monolingual language models in six out of seven tested languages. We release language models in 50 languages translated from English and the source code here.
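As a rough illustration of what translating a language model with a bilingual lexicon could look like, the sketch below rewrites a vocabulary token by token while keeping each token's index, so that the existing embedding rows could be reused unchanged. This is an assumption about the general idea, not GreenPLM's actual procedure; `translate_vocab`, the toy vocabulary, and the toy English-to-Spanish lexicon are all hypothetical.

```python
from typing import Dict


def translate_vocab(vocab: Dict[str, int], lexicon: Dict[str, str]) -> Dict[str, int]:
    """Map each source token to a target-language token, keeping its index."""
    translated: Dict[str, int] = {}
    for token, index in vocab.items():
        # Tokens missing from the lexicon (e.g. special tokens) stay as-is.
        target = lexicon.get(token, token)
        # If two source tokens share a translation, keep the first index seen.
        translated.setdefault(target, index)
    return translated


# Hypothetical source vocabulary (token -> embedding row index).
vocab = {"[CLS]": 0, "cat": 1, "dog": 2, "house": 3}

# Hypothetical English-to-Spanish lexicon.
lexicon = {"cat": "gato", "dog": "perro", "house": "casa"}

spanish_vocab = translate_vocab(vocab, lexicon)
```

Because the indices are preserved, no weights are retrained in this sketch, which is consistent with the "(almost) no additional cost" framing; handling of subword tokens and one-to-many translations is left out.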

YATO: Yet Another deep learning based Text analysis Open toolkit

Zeqiang Wang , Yile Wang , Jiageng Wu , Zhiyang Teng , Jie Yang

Categories: Natural Language Processing

2022-09-28

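We introduce YATO, an open-source toolkit for text analysis with deep learning. It focuses on fundamental sequence labeling and sequence classification tasks on text. Designed in a hierarchical structure, YATO supports free combinations of three types of features: 1) traditional neural networks (CNN, RNN, etc.); 2) pre-trained language models (BERT, RoBERTa, ELECTRA, etc.); and 3) user-customized neural features specified via a simple configuration file. Benefiting from its flexibility and ease of use, YATO can facilitate the reproduction and refinement of state-of-the-art NLP models and promote cross-disciplinary applications of NLP techniques. Source code, examples, and documentation are publicly available at https://github.com/jiesutd/yato.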

METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets

Peilin Zhou , Zeqiang Wang , Dading Chong , Zhijiang Guo , Yining Hua , Zichang Su , Zhiyang Teng , Jiageng Wu , Jie Yang

Categories: Natural Language Processing

2022-09-28

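The COVID-19 pandemic continues to raise various topics of discussion and debate on social media. To explore the impact of the pandemic on people's lives, it is crucial to understand the public's concerns and attitudes toward pandemic-related entities (e.g., drugs, vaccines) on social media. However, models trained on existing named entity recognition (NER) or targeted sentiment analysis (TSA) datasets have limited ability to understand COVID-related social media texts, because these datasets were not designed or annotated from a medical perspective. This paper releases METS-CoV, a dataset containing medical entities and targeted sentiments from COVID-related tweets. METS-CoV contains 10,000 tweets with 7 types of entities, including 4 medical entity types (Disease, Drug, Symptom, and Vaccine) and 3 general entity types (Person, Location, and Organization). To further investigate tweet users' attitudes toward specific entities, 4 types of entities (Person, Organization, Drug, and Vaccine) are selected and annotated with user sentiments, resulting in a targeted sentiment dataset with 9,101 entities (in 5,278 tweets). To the best of our knowledge, METS-CoV is the first dataset to collect medical entities and corresponding sentiments from COVID-related tweets. We benchmark classical machine learning models and state-of-the-art deep learning models through extensive experiments. The results show that the dataset leaves substantial room for improvement on both the NER and TSA tasks. METS-CoV is an important resource for developing better medical social media tools and for facilitating computational social science research, especially in epidemiology. Our data, annotation guidelines, benchmark models, and source code are publicly available (https://github.com/ylab-open/mets-cov) to ensure reproducibility.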

3D Object Detection for Autonomous Driving: A Review and New Outlooks

Jiageng Mao , Shaoshuai Shi , Xiaogang Wang , Hongsheng Li

Categories: Computer Vision | Artificial Intelligence | Robotics

2022-06-19

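Autonomous driving has received increasing attention in recent years for its potential to relieve drivers' burdens and improve driving safety. In modern autonomous driving pipelines, the perception system is an indispensable component that aims to accurately estimate the state of the surrounding environment and provide reliable observations for prediction and planning. 3D object detection, which predicts the locations, sizes, and categories of the critical 3D objects near an autonomous vehicle, is an important part of the perception system. This paper reviews the advances in 3D object detection for autonomous driving. First, we introduce the background of 3D object detection and discuss the challenges of this task. Second, we conduct a comprehensive survey of progress in 3D object detection from the perspectives of models and sensory inputs, including LiDAR-based, camera-based, and multi-modal detection approaches. We also provide an in-depth analysis of the potential and challenges of each category of methods. In addition, we systematically investigate the applications of 3D object detection in driving systems. Finally, we present a performance analysis of 3D object detection approaches, summarize the research trends over the years, and offer an outlook on future directions for the field.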

SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving

Jianhua Han , Xiwen Liang , Hang Xu , Kai Chen , Lanqing Hong , Jiageng Mao , Chaoqiang Ye , Wei Zhang , Zhenguo Li , Xiaodan Liang

Categories: Computer Vision

2021-06-21

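Aiming to facilitate real-world, ever-evolving, and scalable autonomous driving systems, we present a large-scale dataset for standardizing the evaluation of different self-supervised and semi-supervised approaches that learn from raw data; it is the first and largest such dataset to date. Existing autonomous driving systems rely heavily on "perfect" visual perception models (i.e., detection) trained with extensive annotated data to ensure safety. However, it is unrealistic to elaborately label instances for all scenarios and circumstances (i.e., night, extreme weather, cities) when deploying a robust autonomous driving system. Motivated by recent advances in self-supervised and semi-supervised learning, it is promising to learn a robust detection model by jointly exploiting large-scale unlabeled data and a small amount of labeled data. Existing datasets either provide only a small amount of data or cover limited domains with full annotation, which hampers the exploration of large-scale pre-trained models. Here, we release a large-scale 2D self/semi-supervised object detection dataset for autonomous driving, named SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images were collected over 27,833 driving hours under different weather conditions, periods, and location scenes in 32 different cities. We provide extensive experiments and in-depth analyses of existing popular self-supervised and semi-supervised approaches, and report some interesting findings in the autonomous driving setting. Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, yielding superior performance when fine-tuning on different downstream tasks (i.e., detection, semantic/instance segmentation) in the driving domain. More information is available at https://soda-2d.github.io.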

A Survey on Deep Learning-based Single Image Crowd Counting: Network Design, Loss Function and Supervisory Signal

Haoyue Bai , Jiageng Mao , S. -H. Gary Chan

Categories: Computer Vision

2020-12-31

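Single-image crowd counting is a challenging computer vision problem with wide applications in public safety, city planning, traffic management, and more. With the recent development of deep learning techniques, crowd counting has attracted much attention and achieved great success in recent years. This survey provides a comprehensive summary of recent advances in deep learning-based crowd counting by systematically reviewing and summarizing more than 200 works in the area. Our goals are to provide an up-to-date review of recent approaches and to educate new researchers in this field about the design principles and trade-offs. After presenting publicly available datasets and evaluation metrics, we review recent advances with a detailed comparison of three major design modules: deep neural network design, loss function, and supervisory signal. We study and compare the approaches using public datasets and evaluation metrics. We conclude the survey with some future directions.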
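The abstract mentions evaluation metrics without naming them. As an assumption, the sketch below computes two error measures that are commonly reported for counting tasks, mean absolute error (MAE) and root mean squared error (RMSE), over per-image counts. The function name `counting_errors` and the sample counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def counting_errors(
    predicted: Sequence[float], actual: Sequence[float]
) -> Tuple[float, float]:
    """Return (MAE, RMSE) between predicted and ground-truth crowd counts."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Hypothetical per-image counts: model predictions vs. ground truth.
mae, rmse = counting_errors([105.0, 48.0, 310.0], [100.0, 50.0, 300.0])
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

MAE reflects average accuracy, while RMSE penalizes large per-image errors more heavily, so reporting both gives a fuller picture of a counting model's behavior.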
